library(tidyverse)
library(igraph)
library(tidygraph)
library(ggraph)Analyzing Retweet Dynamics of German MPs in the 19th Legislative Period
Assignment 2: Data Analysis Exercise
Load and Manipulate Dataset
In my Master’s Thesis, I scale German MPs (MdBs) of the 19th legislative period across ideological dimensions that are determined in an unsupervised way by using parliamentary speeches and Tweets. While retweet dynamics do not play a big role in the scaling algorithm per se, party affiliation does and is particularly interesting for network analytical approaches, especially in the realm of clustering.
tweet_df_final <- readRDS("data/tweet_df_final_2.rds")
# further meta information
od_tw_info <- read_csv("data/od_tw_merge_final.csv")How many Tweets are there in total? Over 1.3 Million - quite a lot.
tweet_df_final %>%
nrow()[1] 1301427
How many Twitter users are there in total across all MPs of the 19th legislative period?
tweet_df_final %>%
select(author_id, tw_id) %>%
distinct() %>%
nrow()[1] 515
Create data.frame that exhibits information of referenced Tweets
retweet_df <- tweet_df_final %>%
select(referenced_tweets, tweet_type_character, tweet_type_final, id) %>%
mutate(
# type of referenced tweet
ref_tweet_types = map(1:length(referenced_tweets),
~ referenced_tweets[[.x]][["type"]]),
# id of referenced tweet (NOT id of user)
ref_tweet_id = map(1:length(referenced_tweets),
~ referenced_tweets[[.x]][["id"]])
) Create final data.frame that only consists of retweets that occured within the dataset
Filter for retweets and merge author information from retweeted Tweets.
# vector with IDs from Tweets that are from authors within the dataset
tweet_ids <- tweet_df_final$id
retweet_within_df <- retweet_df %>%
filter(ref_tweet_types == "retweeted") %>%
filter(ref_tweet_id %in% tweet_ids) %>%
left_join(tweet_df_final %>% select(id, author_id, tw_id)) %>%
# directedness: from (orginal tweet) -> to (retweeted tweet)
left_join(author_id_ref_tweet, by = "id", suffix = c("_to", "_from"))How many Tweets are left after constructing the “within-retweet data.frame”? Just about 100000 or 7.7% of all Tweets.
retweet_within_df %>% nrow()[1] 100311
paste0((nrow(retweet_within_df) / nrow(tweet_df_final) * 100) %>% round(1), "%")[1] "7.7%"
Validate Retweet-Network Structure
retweet_within_df %>% select(tw_id_from, tw_id_to) %>% head(5) tw_id_from tw_id_to
1 katjakipping dorisachelwilm
2 victorperli dorisachelwilm
3 niemamovassat dorisachelwilm
4 michel_brandt_ dorisachelwilm
5 dietmarbartsch dorisachelwilm
This makes actually sense! Doris Achelwilm (MP from Left Party) retweets Katja Kipping and Dietmar Bartsch, which do both play a central role in the Left Party: They were party leaders in the 19th legislative period.
Plot most retweeted MPs
First, let us create a vector with party colors.
party_colors <- c("#138BD8", "#000000", "#FFEE0A", "#529222", "#AE1862", "#E30019")
names(party_colors) <- c("AfD", "CDU/CSU", "FDP", "Grüne", "DIE LINKE.", "SPD")We also create a ggplot2 theme, which is based on theme_bw(), but has some refinements.
theme_custom <- function() {
font <- "Corbel"
# base theme
theme_bw() %+replace%
theme(
text = element_text(family = font),
legend.text = element_text(size = 9),
legend.title = element_blank(),
legend.box.background = element_rect(
colour = "black",
fill = "white",
linetype = "solid"
),
# grid lines
panel.grid.major = element_line(color = "grey60", size = 0.2),
panel.grid.minor = element_line(color = "grey80", size = 0.1),
# faceting
strip.background = element_blank(),
strip.text = element_text(color = "black"),
)
}Now, we set the new theme as default.
theme_set(theme_custom())We also create a data.frame that just includes the party affiliation per MP.
party_user <- tweet_df_final %>%
distinct(tw_id, party)Plot Top 20
Now, we count how often per MP a Tweet was retweeted by another MP. We plot the to 20 MPs in total and visualize the party affiliation.
retweet_within_df %>%
group_by(tw_id_from) %>%
count() %>%
ungroup() %>%
arrange(desc(n)) %>%
top_n(20) %>%
left_join(party_user, by = c("tw_id_from" = "tw_id")) %>%
ggplot(aes(x = n, y = reorder(tw_id_from, n), fill = party)) +
geom_col() +
scale_fill_manual(values = party_colors) +
labs(y = "", title = "Most Prominent MPs by Retweets of other MPs") +
theme_custom()Interestingly, the AfD and FDP dominate the chart, while there is no single SPD MP within the top 20 most retweeted MPs within. Let’s check the pattern and plot by party! Overall, the results seem to make sense, given that party leaders and influential MPs are the most retweeted MPs according to the bar plot.
Plot Top 20 by Party
Now, let’s turn the attention to the most retweeted MPs by party. Both CDU/CSU and SPD, the biggest parties and also only members of the governing coalition of the 19th legislative period do not have as many influential MPs, measured by number of retweets across their colleagues, as opposition parties have.
retweet_within_df %>%
left_join(party_user, by = c("tw_id_from" = "tw_id")) %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
group_by(tw_id_from, party) %>%
count() %>%
group_by(party) %>%
arrange(desc(n)) %>%
top_n(10) %>%
ggplot(aes(x = n, y = reorder(tw_id_from, n), fill = party)) +
geom_col() +
facet_wrap(~ party, scales = "free_y", ncol = 2) +
scale_fill_manual(values = party_colors) +
labs(y = "", title = "Most Prominent MPs by Retweets of other MPs", subtitle = "Grouped by Party") +
theme_custom()Construct Retweet-Network
In a Retweet-Network each retweet is represented as a directed link in a network that connects from the original Tweet to the referenced retweet. Users that are often retweeted within the network are expected to be associated with some degree of (digital) political relevance, even though there a lot of potential confounding variables (e.g. number of Tweets, number of Followers, general social media behaviour and association with socio-demographic factors such as cohort affiliation).
retweet_network <- retweet_within_df %>%
select(from = tw_id_from, to = tw_id_to) %>%
as_tbl_graph(directed = TRUE) %>%
left_join(party_user, by = c("name" = "tw_id"))
# get corresponding row ID (from ID)
row_id_from <- retweet_network %>%
as_tibble() %>%
mutate(id_from = row_number())
# update retweet network with edge information: name and party of retweeted users (from)
retweet_network <- retweet_network %>%
activate(edges) %>%
left_join(row_id_from, by = c("from" = "id_from")) %>%
rename(party_from = party, name_from = name) %>%
activate(nodes)
retweet_network# A tbl_graph: 487 nodes and 100311 edges
#
# A directed multigraph with 1 component
#
# Node Data: 487 × 2 (active)
name party
<chr> <chr>
1 katjakipping DIE LINKE.
2 victorperli DIE LINKE.
3 niemamovassat DIE LINKE.
4 michel_brandt_ DIE LINKE.
5 dietmarbartsch DIE LINKE.
6 susanneferschl DIE LINKE.
# … with 481 more rows
#
# Edge Data: 100,311 × 4
from to name_from party_from
<int> <int> <chr> <chr>
1 1 44 katjakipping DIE LINKE.
2 2 44 victorperli DIE LINKE.
3 3 44 niemamovassat DIE LINKE.
# … with 100,308 more rows
Degree Centrality
Out-degree
In the previous plots, we already calculated the out-degree centrality (which is the same as the number of outgoing edges per node or the number of original Tweets that have been retweeted within). Nevertheless, double-checking is always a good practice!
# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
mutate(
# in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
arrange(desc(out_degree)) %>%
select(name, out_degree, party) %>%
as_tibble() %>%
head(20) %>%
knitr::kable()| name | out_degree | party |
|---|---|---|
| c_lindner | 4066 | FDP |
| udohemmelgarn | 3983 | AfD |
| alice_weidel | 3853 | AfD |
| frank_pasemann | 3794 | AfD |
| brihasselmann | 2475 | Grüne |
| marcobuschmann | 2294 | FDP |
| schneider_afd | 1830 | AfD |
| beatrix_vstorch | 1554 | AfD |
| gtzfrmming | 1330 | AfD |
| dietmarbartsch | 1208 | DIE LINKE. |
| m_reichardt_afd | 1164 | AfD |
| konstantinkuhle | 1162 | FDP |
| olliluksic | 1154 | FDP |
| paulziemiak | 1022 | CDU/CSU |
| joanacotar | 1021 | AfD |
| konstantinnotz | 956 | Grüne |
| renner_afd | 933 | AfD |
| nicole_hoechst | 921 | AfD |
| jensspahn | 901 | CDU/CSU |
| johannesvogel | 888 | FDP |
Now, let’s plot the distribution of the out-degree. There are a few outliers at the upper end.
retweet_network %N>%
mutate(
# in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
as_tibble() %>%
ggplot(aes(x = out_degree)) +
geom_histogram() +
labs(x = "Out-Degree") +
theme_custom()The out-degree distribution is quite comparable across parties: Most MPs were never retweeted, but have important retweet hubs. However, for the Green and Left Party, the distribution seems to be a bit less skewed.
retweet_network %N>%
filter(!party %in% c("Fraktionslos", NA)) %>%
mutate(
# in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
as_tibble() %>%
ggplot(aes(x = out_degree, fill = party)) +
geom_histogram() +
labs(x = "Out-Degree") +
facet_wrap(~ party, scales = "free") +
scale_fill_manual(values = party_colors) +
theme_custom()In-degree
The in-degree corresponds to the “inverse retweet behaviour”: It indicates which MPs retweeted Tweets of other MPs.
# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
mutate(
in_degree = centrality_degree(mode = "in"),
# out_degree = centrality_degree(mode = "out")
) %>%
arrange(desc(in_degree)) %>%
select(name, in_degree, party) %>%
as_tibble() %>%
head(20) %>%
knitr::kable()| name | in_degree | party |
|---|---|---|
| udohemmelgarn | 6002 | AfD |
| schneider_afd | 5001 | AfD |
| frank_pasemann | 3583 | AfD |
| martinrosemann | 3409 | SPD |
| renner_afd | 3270 | AfD |
| olliluksic | 2140 | FDP |
| nicole_hoechst | 1735 | AfD |
| drandreasnick | 1567 | CDU/CSU |
| stbrandner | 1519 | AfD |
| rkiesewetter | 1505 | CDU/CSU |
| reinholdmdb | 1494 | FDP |
| c_jung77 | 1284 | FDP |
| k_sa | 1225 | Grüne |
| elsnervongronow | 1187 | AfD |
| tschipanski | 1006 | CDU/CSU |
| ullinissen | 991 | SPD |
| andi_wagner | 969 | DIE LINKE. |
| gtzfrmming | 868 | AfD |
| djanecek | 834 | Grüne |
| steinekecdu | 807 | CDU/CSU |
retweet_network %N>%
mutate(
in_degree = centrality_degree(mode = "in"),
# out_degree = centrality_degree(mode = "out")
) %>%
as_tibble() %>%
ggplot(aes(x = in_degree)) +
geom_histogram() +
labs(x = "In-Degree") +
theme_custom()retweet_network %N>%
filter(!party %in% c("Fraktionslos", NA)) %>%
mutate(
in_degree = centrality_degree(mode = "in"),
# out_degree = centrality_degree(mode = "out")
) %>%
as_tibble() %>%
ggplot(aes(x = in_degree, fill = party)) +
geom_histogram() +
labs(x = "In-Degree") +
facet_wrap(~ party, scales = "free") +
scale_fill_manual(values = party_colors) +
theme_custom()The in-degree distribution seems to be a bit more normally distributed, meaning that “inverse hubs”, or MPs that heavily retweeted other MPs, are also present but not as dominant as retweeted MP hubs. However, the AfD seems to dominate the retweet dynamic from this perspective, as the table above shows.
Comparing Out- and In-degree
In- and out-degree seem to be correlated (\(R = 0.53\)), however, there are outliers that influence the relationship heavily.
# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
mutate(
in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
select(name, in_degree, out_degree) %>%
as_tibble() %>%
left_join(party_user, by = c("name" = "tw_id")) %>%
ggplot(aes(out_degree, in_degree)) +
geom_point(alpha = 0.5) +
ggpubr::stat_cor() +
theme_custom()It seems that there are a few outliers at the upper end of the distribution. Let’s see whether the correlation holds when they are removed.
The correlation is still quite high (\(R = 0.38\)) and highly significant, when we remove the MPs with the top 20 highest in- and out-degrees.
top_20_in <- retweet_network %>%
mutate(in_degree = centrality_degree(mode = "in")) %>%
arrange(desc(in_degree)) %>%
as_tibble() %>%
top_n(20) %>%
pull(name)
top_20_out <- retweet_network %>%
mutate(out_degree = centrality_degree(mode = "out")) %>%
arrange(desc(out_degree)) %>%
as_tibble() %>%
top_n(20) %>%
pull(name)
# %N>% activate nodes (tidygraph syntax)
retweet_network %N>%
mutate(
in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
select(name, in_degree, out_degree) %>%
as_tibble() %>%
left_join(party_user, by = c("name" = "tw_id")) %>%
filter(!name %in% top_20_in, !name %in% top_20_out) %>%
ggplot(aes(out_degree, in_degree)) +
geom_point(alpha = 0.5) +
ggpubr::stat_cor() +
theme_custom()retweet_network %N>%
filter(!party %in% c("Fraktionslos", NA)) %>%
mutate(
in_degree = centrality_degree(mode = "in"),
out_degree = centrality_degree(mode = "out")
) %>%
select(name, in_degree, out_degree) %>%
as_tibble() %>%
left_join(party_user, by = c("name" = "tw_id")) %>%
filter(!name %in% top_20_in, !name %in% top_20_out) %>%
ggplot(aes(out_degree, in_degree)) +
geom_point(alpha = 0.5, aes(color = party)) +
ggpubr::stat_cor(show.legend = F) +
facet_wrap(~ party) +
scale_color_manual(values = party_colors) +
theme_custom()Distance / Shortest Path Length
Most of the users are connected across (a path length of) 2 or 3 retweets.
distances_retweet_network <- retweet_network %>%
distances() %>%
table() %>%
as_tibble() %>%
rename(Distance = 1)
distances_retweet_network %>%
knitr::kable()| Distance | n |
|---|---|
| 0 | 487 |
| 1 | 18554 |
| 2 | 105456 |
| 3 | 101042 |
| 4 | 11290 |
| 5 | 332 |
| 6 | 8 |
distances_retweet_network %>%
ggplot(aes(x = Distance, y = n)) + geom_col() +
ggtitle("Distribution of Distance (Shortest Path)") +
theme_custom()Network Visualization
Create a new custom theme, based on ggraph::theme_graph().
theme_graph_custom <- function() {
font <- "Corbel"
# base theme
theme_graph() %+replace%
theme(
text = element_text(family = font),
legend.text = element_text(size = 9),
legend.title = element_blank(),
legend.box.background = element_rect(
colour = "black",
fill = "white",
linetype = "solid"
),
# faceting
strip.background = element_blank(),
strip.text = element_text(color = "black"),
)
}Standard Plot
Using ggraph auto/default options to plot all nodes and edges, we can see that there is a strong pattern of “intra-party retweeting”. Interestingly, while most parties members exhibits almost exclusively retweet within the party, there is a strong visual intersection of SPD and Left Party retweets. To a lesser extent, this also repeats for party of the Greens with the SPD
plot_all_auto <- retweet_network %>%
ggraph(layout = "auto") +
geom_edge_link(alpha = 0.015) +
geom_node_point(aes(color = party)) +
scale_color_manual(values = party_colors) +
theme_graph_custom()
plot_all_autoCircular
Now let’s visualize the network in a circular way, highlighting the retweet dynamics in a way that underlines the party structure of the dataset.
plot_all_circular <- retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
ggraph(layout = "linear", circular = TRUE) +
geom_edge_arc(aes(color = party_from), alpha = 0.015) +
scale_edge_color_manual(values = party_colors) +
coord_fixed() +
theme_graph_custom() +
theme(legend.position = "none")
plot_all_circularFurther Centrality Measures
Closeness Centrality
The Closeness Centrality is defined as the inverse total distance of paths between a node and all other nodes in the network. Shorter paths to others are thus an indicator of node centrality.
closeness_retweet_network <- retweet_network %>%
mutate(Closeness = centrality_closeness())The Closeness Centrality is not dominated by the AfD, which was exhibiting the MPs with the highest degrees. This finding underlines that centrality measures that go beyond the degree centrality play an important role when assessing the actual dynamics of the dataset.
closeness_retweet_network %>%
filter(Closeness != 1) %>%
arrange(desc(Closeness)) %>%
select(name, Closeness, party) %>%
as_tibble() %>%
head(20) %>%
knitr::kable()| name | Closeness | party |
|---|---|---|
| konstantinkuhle | 0.0011416 | FDP |
| karl_lauterbach | 0.0010776 | SPD |
| brihasselmann | 0.0010672 | Grüne |
| marcobuschmann | 0.0010616 | FDP |
| heikomaas | 0.0010604 | SPD |
| larsklingbeil | 0.0010504 | SPD |
| johannesvogel | 0.0010194 | FDP |
| rkiesewetter | 0.0010194 | CDU/CSU |
| renatekuenast | 0.0010183 | Grüne |
| thomasoppermann | 0.0010183 | SPD |
| mgrossebroemer | 0.0010163 | CDU/CSU |
| c_lindner | 0.0010152 | FDP |
| konstantinnotz | 0.0010152 | Grüne |
| schneidercar | 0.0010121 | SPD |
| katarinabarley | 0.0010121 | SPD |
| nouripour | 0.0010101 | Grüne |
| mastrackzi | 0.0010091 | FDP |
| sigmargabriel | 0.0010091 | SPD |
| lambsdorff | 0.0010070 | FDP |
| cem_oezdemir | 0.0010040 | Grüne |
The closeness centrality seems to be actually normally distributed, while there are some outliers on the lower bound.
closeness_retweet_network %>%
filter(Closeness != 1) %>%
as_tibble() %>%
ggplot(aes(x = Closeness)) +
geom_density() +
theme_custom()While most parties have an approximately equally distributed Closeness Centrality, the AfD has the lowest one with a visually perceivable difference. For interpretational reasons, this is likelly due to internal heterogeneity, given the populist character and short existence period of the party.
closeness_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
filter(Closeness != 1) %>%
as_tibble() %>%
ggplot(aes(x = Closeness, y = reorder(party, Closeness), fill = party)) +
ggridges::stat_density_ridges(alpha = 0.5, quantile_lines = T, quantiles = 4) +
scale_fill_manual(values = party_colors) +
theme_custom() +
labs(y = "")Betweenness Centrality
Next to Closeness, there is Betweenness Centrality. Betweenness is defined as the number of shortest paths that go through a node (divided by all shortest paths). It is thus also a measure of Centrality, but with a different statistical focus.
betweenness_retweet_network <- retweet_network %>%
mutate(Betweenness = centrality_betweenness())betweenness_retweet_network %>%
arrange(desc(Betweenness)) %>%
select(name, Betweenness, party) %>%
as_tibble() %>%
head(20) %>%
knitr::kable()| name | Betweenness | party |
|---|---|---|
| tschipanski | 26989.810 | CDU/CSU |
| s_muenzenmaier | 19464.496 | AfD |
| stbrandner | 18131.114 | AfD |
| olliluksic | 14463.042 | FDP |
| rkiesewetter | 12089.569 | CDU/CSU |
| _martinneumann | 8452.709 | FDP |
| fabiodemasi | 8202.163 | DIE LINKE. |
| joanacotar | 8010.589 | AfD |
| hubertus_heil | 7998.812 | SPD |
| c_lindner | 7090.924 | FDP |
| gtzfrmming | 6647.937 | AfD |
| timon_gremmels | 6566.856 | SPD |
| wanderwitz | 6504.270 | CDU/CSU |
| matthiashauer | 6361.569 | CDU/CSU |
| udohemmelgarn | 6303.254 | AfD |
| mgrossebroemer | 6059.420 | CDU/CSU |
| sven_kindler | 5791.922 | Grüne |
| larsklingbeil | 5217.861 | SPD |
| drandreasnick | 4975.422 | CDU/CSU |
| konstantinnotz | 4778.380 | Grüne |
The Betweenness distribution is heavily skewed. There are only 5 MPs that have a Betweenness > 10000.
betweenness_retweet_network %>%
as_tibble() %>%
ggplot(aes(x = Betweenness)) +
geom_histogram() +
theme_custom()For matters of visualization, it thus makes sense to plot the log of the Betweenness.
betweenness_retweet_network %>%
as_tibble() %>%
ggplot(aes(x = Betweenness)) +
geom_histogram() +
scale_x_log10() +
theme_custom()After seeing the log distribution, it is easier to compare the distribution across the parties. While most parties have an unimodal distribution, the AfD has a flatter distribution with a lower average betweenness and no real peak. This confirms the finding of the applied Closeness Centrality.
betweenness_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
filter(Betweenness != 1) %>%
as_tibble() %>%
ggplot(aes(x = Betweenness, color = party)) +
facet_wrap(~ party) +
geom_density() +
scale_x_log10() +
# geom_density() +<
scale_color_manual(values = party_colors) +
theme_custom() +
labs(y = "")Eigenvector Centrality
Eigenvector Centrality takes into consideration the whole network when assessing centrality. The assigned centrality depends of a node depends on the centrality of neighbours
eigen_retweet_network <- retweet_network %>%
mutate(eigen = centrality_eigen())Interestingly, the AfD dominates the Eigenvector Centrality. Given the retweet dynamic that most AfD MPs seem to retweet a lot within the party, but not just to certain central hubs, this is an intersting finding.
eigen_retweet_network %>%
filter(eigen != 1) %>%
arrange(desc(eigen)) %>%
select(name, eigen, party) %>%
as_tibble() %>%
head(20) %>%
knitr::kable()| name | eigen | party |
|---|---|---|
| udohemmelgarn | 0.7522489 | AfD |
| schneider_afd | 0.4050560 | AfD |
| renner_afd | 0.3067681 | AfD |
| alice_weidel | 0.2783748 | AfD |
| nicole_hoechst | 0.2208869 | AfD |
| m_reichardt_afd | 0.1844001 | AfD |
| elsnervongronow | 0.1345158 | AfD |
| stbrandner | 0.1104824 | AfD |
| beatrix_vstorch | 0.0965541 | AfD |
| gtzfrmming | 0.0935327 | AfD |
| tino_chrupalla | 0.0607585 | AfD |
| stefankeuterafd | 0.0543826 | AfD |
| r_hartwig_afd | 0.0441641 | AfD |
| joanacotar | 0.0410962 | AfD |
| s_muenzenmaier | 0.0379447 | AfD |
| buettner_sdl | 0.0290164 | AfD |
| DrMEspendiller | 0.0274493 | AfD |
| petrbystronafd | 0.0267201 | AfD |
| gottfriedcurio | 0.0208919 | AfD |
| martin_hess_mdb | 0.0198649 | AfD |
The dynamic is also repeated visually: Most parties have an approximately normally distributed Eigenvector Centrality, while the AfD has a peak on the right side (“left skewed”).
eigen_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
filter(eigen != 1) %>%
as_tibble() %>%
ggplot(aes(x = eigen, color = party)) +
facet_wrap(~ party) +
geom_density() +
scale_x_log10() +
# geom_density() +<
scale_color_manual(values = party_colors) +
theme_custom() +
labs(y = "")Community Detection
For substantive reasons, community detection is an interest approach given the already existing party clusters, that are visible when plotting the whole network. In addition, since we are dealing with a directed network, not every type of community detection algorithm is appropriate.
Components
Let’s start with the basic components detection.
group_comp_retweet_network <- retweet_network %>%
mutate(
group_comp_weak = group_components("weak"),
group_comp_strong = group_components("strong")
)Applying the “weak” component algorithm yields a redundant community detection: Every node is part of the same community
group_comp_retweet_network %>% distinct(group_comp_weak)# A tbl_graph: 1 nodes and 5 edges
#
# A directed multigraph with 1 component
#
# Node Data: 1 × 1 (active)
group_comp_weak
<int>
1 1
#
# Edge Data: 5 × 4
from to name_from party_from
<int> <int> <chr> <chr>
1 1 1 katjakipping DIE LINKE.
2 1 1 katjakipping DIE LINKE.
3 1 1 katjakipping DIE LINKE.
# … with 2 more rows
However, when applying the “strong” definition, there are 42 distinct communities detected. “Strong” refers to detecting cliques, meaning that every of those 42 subgroups are fully connected within.
group_comp_retweet_network %>% distinct(group_comp_strong) # A tbl_graph: 42 nodes and 21 edges
#
# A directed multigraph with 40 components
#
# Node Data: 42 × 1 (active)
group_comp_strong
<int>
1 1
2 15
3 13
4 12
5 11
6 40
# … with 36 more rows
#
# Edge Data: 21 × 4
from to name_from party_from
<int> <int> <chr> <chr>
1 3 19 edrossmann SPD
2 6 6 siegbertdroese AfD
3 8 8 frankeedgar SPD
# … with 18 more rows
Let’s now have a look on how the found components correspond to the party structure.
group_comp_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
as_tibble() %>%
group_by(party, group_comp_strong) %>%
count() %>%
arrange((desc(n))) %>%
knitr::kable()| party | group_comp_strong | n |
|---|---|---|
| CDU/CSU | 1 | 101 |
| SPD | 1 | 86 |
| FDP | 1 | 80 |
| AfD | 1 | 63 |
| Grüne | 1 | 59 |
| DIE LINKE. | 1 | 53 |
| AfD | 8 | 1 |
| AfD | 9 | 1 |
| AfD | 11 | 1 |
| AfD | 38 | 1 |
| AfD | 39 | 1 |
| AfD | 40 | 1 |
| AfD | 41 | 1 |
| AfD | 42 | 1 |
| CDU/CSU | 2 | 1 |
| CDU/CSU | 3 | 1 |
| CDU/CSU | 5 | 1 |
| CDU/CSU | 6 | 1 |
| CDU/CSU | 10 | 1 |
| CDU/CSU | 12 | 1 |
| CDU/CSU | 15 | 1 |
| CDU/CSU | 18 | 1 |
| CDU/CSU | 19 | 1 |
| CDU/CSU | 20 | 1 |
| CDU/CSU | 22 | 1 |
| CDU/CSU | 23 | 1 |
| CDU/CSU | 24 | 1 |
| CDU/CSU | 25 | 1 |
| CDU/CSU | 26 | 1 |
| CDU/CSU | 34 | 1 |
| CDU/CSU | 35 | 1 |
| CDU/CSU | 36 | 1 |
| DIE LINKE. | 16 | 1 |
| FDP | 17 | 1 |
| Grüne | 37 | 1 |
| SPD | 4 | 1 |
| SPD | 7 | 1 |
| SPD | 13 | 1 |
| SPD | 14 | 1 |
| SPD | 21 | 1 |
| SPD | 27 | 1 |
| SPD | 28 | 1 |
| SPD | 29 | 1 |
| SPD | 30 | 1 |
| SPD | 31 | 1 |
| SPD | 32 | 1 |
| SPD | 33 | 1 |
Turns out, that most parties are part of group 1, by far. The algorithm is not really informative as well.
group_comp_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
as_tibble() %>%
ggplot(aes(x = group_comp_strong, fill = party)) +
facet_wrap(~ party) +
geom_bar() +
scale_fill_manual(values = party_colors) +
theme_custom() Infomap
As the components did not yield sufficient discrimiant communities, let’s try out other algorithms. Infomap follows the random walker approach.
set.seed(42)
group_infomap_retweet_network <- retweet_network %>%
mutate(
group_infomap = group_infomap()
)It seems to work, at least partially! Especially the AfD communities are detected efficiently and thus separated from the other parties. However, a validated finding would need a sufficient amount of trials and tuning to get rid of the random component.
The parties are differently present across the groups, which seems satisfies the discrimant goal.
group_infomap_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
as_tibble() %>%
group_by(party, group_infomap) %>%
count() %>%
arrange(party, desc(n)) %>%
group_by(party) %>%
slice(1:3) %>%
knitr::kable(caption = "Top 3 Infomap Groups by Party")| party | group_infomap | n |
|---|---|---|
| AfD | 4 | 22 |
| AfD | 6 | 21 |
| AfD | 3 | 10 |
| CDU/CSU | 2 | 16 |
| CDU/CSU | 12 | 14 |
| CDU/CSU | 10 | 12 |
| DIE LINKE. | 8 | 13 |
| DIE LINKE. | 23 | 9 |
| DIE LINKE. | 28 | 7 |
| FDP | 7 | 11 |
| FDP | 1 | 8 |
| FDP | 5 | 8 |
| Grüne | 24 | 9 |
| Grüne | 1 | 7 |
| Grüne | 33 | 6 |
| SPD | 9 | 9 |
| SPD | 20 | 9 |
| SPD | 22 | 9 |
group_infomap_retweet_network %>%
filter(!party %in% c("Fraktionslos", NA)) %>%
as_tibble() %>%
ggplot(aes(x = group_infomap, fill = party)) +
facet_wrap(~ party) +
geom_bar() +
scale_fill_manual(values = party_colors) +
theme_custom() Further Work
For the last assignment, it would be interesting to dive in a bit more into the correspondence of certain community detection and clustering algorithms and check their correspondence with the party structure. In addition, given that we are dealing with social media data dynamics, there is a lot of social media context data available. Every Tweet has a certain number of likes and comments attached that could serve as weights. Furthermore, every profile exhibits a number of followers that indicate the social media relevance. It would be interesting to check whether and which type of centrality is statistically associated with the number of followers.